Method and apparatus for low-complexity spatial scalable decoding

ABSTRACT

A video decoder and method for low-complexity spatial scalable video are disclosed, the decoder for receiving compressed high-resolution scalable and standard-resolution bitstreams and providing high-resolution video, and including an I-picture detector for receiving the compressed standard-resolution bitstream, a standard-resolution Intra decoder coupled with the I-picture detector for decoding I-pictures, a high-resolution video decoder for receiving the compressed high-resolution scalable bitstream, and a selector coupled with the standard-resolution Intra video decoder and the high-resolution video decoder for selecting between the outputs from the standard-resolution Intra video decoder and the high-resolution video decoder to provide the high-resolution video sequence.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application Ser. No. 60/1479,734 (Attorney Docket No. PU030166), filed Jun. 19, 2003 and entitled “METHOD AND APPARATUS FOR LOW COMPLEXITY SPATIAL SCALABLE ENCODING AND DECODING”, which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention is directed towards video coders and decoders (CODECs), and more particularly, towards an apparatus and method for spatial scalable encoding and decoding.

BACKGROUND OF THE INVENTION

Broadcast video service providers currently use MPEG-2 to transmit standard definition (“SD”) video programs. In the future, a transition to high definition (“HD”) using the JVT/H.264/MPEG AVC (“JVT”) standard is anticipated. Simulcasting of both an MPEG-2 SD program and a JVT HD version of the same program requires more bandwidth than if a scalable approach were used. However, scalable encoders and decoders are significantly more computationally complex than are non-scalable encoders and decoders.

Many different methods of scalability have been widely studied and standardized in the scalability profiles of the MPEG-2 and MPEG-4 standards, including SNR scalability, spatial scalability, temporal scalability, and fine grain scalability. Scalable coding has not been widely adopted in practice, however, because of the considerable increase in complexity for implementing scalable encoders and decoders.

Spatial scalable encoders and decoders typically require that the high-resolution scalable encoder/decoder provide functionality in addition to what would be present in a non-scalable high-resolution encoder/decoder. In an MPEG-2 spatial scalable encoder, a decision is made whether prediction is performed from a standard-resolution or a high-resolution reference picture. An MPEG-2 spatial scalable decoder is capable of predicting from either the standard-resolution picture or the high-resolution picture. Two sets of reference picture stores are used by an MPEG-2 spatial scalable encoder/decoder, one for standard-resolution pictures and another for high-resolution pictures.

Accordingly, what is needed is a reduced-complexity spatial scalable encoder/decoder capable of supporting both SD and HD versions of the same program over limited-bandwidth connections.

SUMMARY OF THE INVENTION

These and other drawbacks and disadvantages of the prior art are addressed by an apparatus and method for low-complexity spatial scalable decoding.

The decoder, for receiving compressed high-resolution scalable and standard-resolution bitstreams and providing high-resolution video, includes an I-picture detector (464) for receiving the compressed standard-resolution bitstream, a standard-resolution Intra decoder (466) coupled with the I-picture detector for decoding I-pictures, a high-resolution video decoder (482) for receiving the compressed high-resolution scalable bitstream, and a selector (486) coupled with the standard-resolution Intra video decoder and the high-resolution video decoder for selecting between the outputs from the standard-resolution Intra video decoder and the high-resolution video decoder to provide the high-resolution video sequence.

These and other aspects, features and advantages of the present invention will become apparent from the following description of exemplary embodiments, which is to be read in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood in accordance with the following exemplary figures, in which:

FIG. 1 shows a block diagram for a relatively high-complexity spatial scalable encoder;

FIG. 2 shows a block diagram for a relatively high-complexity spatial scalable decoder;

FIG. 3 shows a block diagram for a low-complexity spatial scalable encoder in accordance with principles of the present invention; and

FIG. 4 shows a block diagram for a low-complexity spatial scalable decoder in accordance with principles of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Embodiments of the presently disclosed invention provide a method and apparatus for low-complexity, generally low-cost, spatial scalable encoding and decoding. In the description that follows, an encoder and decoder may be collectively referred to as a CODEC for purposes of simplicity, although method and apparatus embodiments may be capable of only encoding, only decoding, or both encoding and decoding.

In accordance with the principles of the invention, a low-complexity spatial scalable CODEC utilizes non-scalable encoder and/or decoder blocks. The term “normal” may be used herein and/or in the drawings to refer to generally non-scalable as opposed to specifically scalable elements and/or features of higher complexity, and shall specifically not imply that the element and/or feature is necessarily conventional.

In the instant embodiment of the present invention, Intra-coded (I) pictures are scalably coded using a spatial scalability technique, while non-intra coded (P and B) pictures are encoded non-scalably. The high-resolution input image is down-sampled to form a standard-resolution image, and the standard-resolution image is encoded and decoded using a non-scalable encoder/decoder. The decoded image is up-sampled, and then subtracted from the input high-resolution image. The difference between the high-resolution image and the up-sampled standard-resolution image is then encoded using a non-scalable encoder. At the decoder end, only I-coded standard-resolution pictures are decoded using a non-scalable decoder, then they are up-sampled and added to the decoded high-resolution difference signal, to form the high-resolution output pictures. Non I-coded high-resolution pictures are decoded non-scalably.

Thus, in the instant embodiment of the present invention, spatial scalable encoding/decoding is performed only for Intra-coded pictures or slices, and non-scalable encoding/decoding for non-intra coded pictures or slices. Scalable encoding provides a significant coding efficiency advantage as compared to simulcast for intra-coded (I) pictures, but less of an advantage for inter-coded (B and P) pictures. The complexity of a spatial scalable encoder and decoder can be considerably reduced by using scalability techniques only in intra-coded pictures, while retaining much of the coding efficiency advantages.

In accordance with the principles of the present invention, scalability-capable video encoder and decoder modules are not required. Instead, non-scalable high-resolution encoders and decoders can be used in this system, in conjunction with additional functional blocks. The standard-resolution and high-resolution encoders and decoders may comply with any video compression standard, such as MPEG-2, MPEG-4, or H.264. For example, the standard-resolution encoder and decoder may be standards-compliant MPEG-2 Main Profile, and the high-resolution encoder and decoder may be standards-compliant H.264 encoders and decoders. Other combinations may also be considered, as would be apparent to those skilled in the art.

The present description illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. Applicant thus regards any means that can provide those functionalities as equivalent to those shown herein.

As shown in FIG. 1, a standard-complexity spatial scalable encoder supporting two layers is indicated generally by the reference numeral 100. The encoder 100 includes a downsampler 110 for receiving a high-resolution input video sequence. The downsampler 110 is coupled in signal communication with a standard-resolution non-scalable encoder 112, which, in turn, is coupled in signal communication with standard-resolution frame stores 114. The standard-resolution non-scalable encoder 112 outputs a standard-resolution bitstream, and is further coupled in signal communication with a standard-resolution non-scalable decoder 120.

The standard-resolution non-scalable decoder 120 is coupled in signal communication with an upsampler 130, which, in turn, is coupled in signal communication with a scalable high-resolution encoder 140. The scalable high-resolution encoder 140 also receives the high-resolution input video sequence, is coupled in signal communication with high-resolution frame stores 150, and outputs a high-resolution scalable bitstream.

Thus, a high-resolution input video sequence is received by the standard-complexity encoder 100 and down-sampled to create a standard-resolution video sequence. The standard-resolution video sequence is encoded using a non-scalable standard-resolution video compression encoder, creating a standard-resolution bitstream. The standard-resolution bitstream is decoded using a non-scalable standard-resolution video compression decoder. (This function may be performed inside of the encoder.) The decoded standard-resolution sequence is up-sampled, and provided as one of two inputs to a scalable high-resolution encoder. The scalable high-resolution encoder encodes the video to create a high-resolution scalable bitstream.

Turning to FIG. 2, a standard-complexity spatial scalable decoder supporting two layers is indicated generally by the reference numeral 200. The spatial scalable decoder 200 includes a standard-resolution decoder 260 for receiving a standard-resolution bitstream, which is coupled in signal communication with standard-resolution frame stores 262, and outputs a standard-resolution video sequence. The standard-resolution decoder 260 is further coupled in signal communication with an upsampler 270, which, in turn, is coupled in signal communication with a scalable high-resolution decoder 280.

The scalable high-resolution decoder 280 is further coupled in signal communication with high-resolution frame stores 290. The scalable high-resolution decoder 280 receives a high-resolution scalable bitstream and outputs a high-resolution video sequence.

Thus, both a high-resolution scalable bitstream and a standard-resolution bitstream are received by the standard-complexity decoder 200. The standard-resolution bitstream is decoded using a non-scalable standard-resolution video compression decoder, which utilizes standard-resolution frame stores. The decoded standard-resolution video is up-sampled, and then input into a high-resolution scalable decoder. The high-resolution scalable decoder utilizes a set of high-resolution frame stores, and creates the high-resolution output video sequence.

As shown in FIG. 3, a low-complexity spatial scalable encoder supporting two layers is indicated generally by the reference numeral 300. The encoder 300 includes a downsampler 310 for receiving a high-resolution input video sequence. The downsampler 310 is coupled in signal communication with a standard-resolution non-scalable encoder 312, which, in turn, is coupled in signal communication with standard-resolution frame stores 314. The standard-resolution non-scalable encoder 312 outputs a standard-resolution bitstream, and is further coupled in signal communication with a standard-resolution non-scalable Intra decoder 322.

The non-scalable standard-resolution Intra decoder 322 is coupled in signal communication with an upsampler 330, which, in turn, is coupled in signal communication with each of an inverting input of a first summing unit 342 and a non-inverting input of a second summing unit 344. The first summing unit 342 has a non-inverting input for receiving the high-resolution input video sequence, and has an output coupled in signal communication with a selector 346. The selector 346 also has an input for receiving the high-resolution input video sequence, as well as a third input for receiving an I-slice/I-picture indicator from the standard-resolution non-scalable encoder 312. The selector 346 is coupled in signal communication with a non-scalable high-resolution encoder 348. The non-scalable high-resolution encoder 348 is for outputting a high-resolution scalable bitstream, and is coupled in signal communication with a non-inverting input of the second summing unit 344. The non-scalable high-resolution encoder 348 is further coupled in signal communication with frame stores 350. The frame stores 350 are coupled in signal communication with an output of the second summing unit 344.

Thus, the low-complexity spatial scalable encoder embodiment 300 receives a high-resolution input video sequence. The sequence is down-sampled to create a standard-resolution video sequence. The standard-resolution video sequence is encoded using a non-scalable standard-resolution encoder, creating a standard-resolution bitstream. The Intra-coded (I) pictures are decoded using a non-scalable standard-resolution decoder. Alternatively, this function may be performed as an ancillary function within the encoder itself. The decoded standard-resolution I pictures are up-sampled, and subtracted from the input video pictures. An offset (for example −128) may optionally be added to the difference, to maintain pixel values in the range of [0, 255]. These difference pictures are then input to a non-scalable high-resolution video compression encoder. The up-sampled standard-resolution decoded I pictures are added to the high-resolution encoded difference signal, with optional offset, before storage in the high-resolution frame stores. This allows a correct reference picture to be used in subsequent non-scalable coding of P and B pictures. For the non-I pictures (P and B), the input video sequence pictures are input to the non-scalable high-resolution video encoder, and encoded non-scalably.
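The picture routing performed by the first summing unit 342, the selector 346 and the second summing unit 344 may be sketched as follows. This is a minimal illustration only, not the disclosed implementation: pictures are modelled as lists of rows of integer samples, the function names are hypothetical, the offset is left as a parameter because the text gives only an example value, and it is assumed that the offset added before encoding is removed again when the reference picture is rebuilt. Clipping to [0, 255] and the lossy encode/decode step are omitted.

```python
def form_difference(hi_picture, upsampled_i, offset=0):
    """Subtract the up-sampled I picture from the input picture, then add the offset."""
    return [
        [h - u + offset for h, u in zip(hi_row, up_row)]
        for hi_row, up_row in zip(hi_picture, upsampled_i)
    ]


def encoder_input(hi_picture, upsampled_i, is_i_picture, offset=0):
    """Picture passed to the non-scalable high-resolution encoder (role of selector 346)."""
    if is_i_picture:
        return form_difference(hi_picture, upsampled_i, offset)
    return hi_picture  # P and B pictures are encoded non-scalably


def reference_picture(decoded_difference, upsampled_i, offset=0):
    """Picture written to the high-resolution frame stores for an I picture.

    Assumption: the offset added before encoding is removed again here.
    """
    return [
        [d - offset + u for d, u in zip(diff_row, up_row)]
        for diff_row, up_row in zip(decoded_difference, upsampled_i)
    ]
```

If the difference picture survived encoding without loss, reference_picture would return the original high-resolution I picture, which is what lets later P and B pictures predict from a correct reference.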

Turning to FIG. 4, a low-complexity spatial scalable decoder supporting two layers is indicated generally by the reference numeral 400. The low-complexity spatial scalable decoder 400 includes an I-picture detector/selector 464 for receiving a standard-resolution bitstream, which is coupled in signal communication with a standard-resolution Intra decoder 466. The standard-resolution Intra decoder 466 is coupled in signal communication with an upsampler 470, which, in turn, is coupled in signal communication with a first non-inverting input of a summing unit 484. The standard-resolution Intra decoder 466 is further coupled in signal communication with a first input of a selector 486 for providing an intra-coding indicator to the selector 486.

The low-complexity spatial scalable decoder 400 further includes a non-scalable high-resolution decoder 482 for receiving a high-resolution scalable bitstream. The high-resolution decoder 482 is coupled in signal communication with each of a second non-inverting input of the summing unit 484, a second input of the selector 486, and high-resolution frame stores 490. The summing unit 484 has an output coupled in signal communication with a third input of the selector 486. The selector 486 outputs a high-resolution video sequence, and is coupled in signal communication with the high-resolution frame stores 490.

Thus, the low-complexity spatial scalable decoder embodiment 400 includes an I-picture detector/selector that searches the received standard-resolution bitstream and removes all non-I picture coded data. It may identify I-picture data by searching for picture start codes in the bitstream, and decoding the picture coding type from the picture header. A non-scalable standard-resolution Intra decoder then decodes the I-picture data. An Intra-only decoder such as this is of considerably lower complexity than a full video compression decoder, and does not require standard-resolution reference frame stores. The decoded standard-resolution Intra pictures are up-sampled.
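By way of illustration only, the start-code search may be sketched for the case where the standard-resolution layer is MPEG-2 video. In that syntax the picture start code is the byte sequence 00 00 01 00, followed by a 10-bit temporal_reference and a 3-bit picture_coding_type (1 = I, 2 = P, 3 = B). The function names are hypothetical, and the sketch is deliberately simplified: it keeps only the bytes from each I-picture header up to the next picture header, so sequence and group-of-pictures headers that a practical detector/selector would retain are not handled.

```python
PICTURE_START_CODE = b"\x00\x00\x01\x00"  # MPEG-2 picture_start_code

# picture_coding_type values used by MPEG-2 video
I_PICTURE, P_PICTURE, B_PICTURE = 1, 2, 3


def picture_coding_types(bitstream: bytes):
    """Yield (byte_offset, picture_coding_type) for each picture header found."""
    pos = bitstream.find(PICTURE_START_CODE)
    while pos != -1:
        header = bitstream[pos + 4 : pos + 6]
        if len(header) < 2:
            break  # truncated header: stop rather than guess
        # After the start code come temporal_reference (10 bits) and then
        # picture_coding_type (3 bits), i.e. bits 5..3 of the second byte.
        coding_type = (header[1] >> 3) & 0x07
        yield pos, coding_type
        pos = bitstream.find(PICTURE_START_CODE, pos + 4)


def keep_i_pictures(bitstream: bytes) -> bytes:
    """Return only the byte ranges that begin at an I-picture header."""
    headers = list(picture_coding_types(bitstream))
    kept = bytearray()
    for index, (start, coding_type) in enumerate(headers):
        end = headers[index + 1][0] if index + 1 < len(headers) else len(bitstream)
        if coding_type == I_PICTURE:
            kept += bitstream[start:end]
    return bytes(kept)
```

The output of keep_i_pictures would then be what the standard-resolution Intra decoder 466 receives; because no P or B data reaches it, that decoder needs no reference frame stores.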

The high-resolution scalable bitstream is input to a non-scalable high-resolution decoder. For non-I pictures, its output is selected as the output high-resolution video sequence. For I pictures, the high-resolution decoded output is added to the up-sampled standard-resolution decoded I pictures, which is selected to form the output high-resolution video sequence. For scalable I pictures, the output high-resolution video picture is stored in the reference frame store, rather than the output of the non-scalable high-resolution decoder.
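The per-picture behaviour of the summing unit 484 and the selector 486 may be sketched as follows. This is a minimal sketch under stated assumptions: pictures are lists of rows of integer samples, the function names are hypothetical, no offset is applied, and the actual decoding of either bitstream is outside the sketch.

```python
def add_pictures(first, second):
    """Sample-wise sum of two equally sized pictures (role of summing unit 484)."""
    return [
        [a + b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


def select_output(hi_decoded, upsampled_i, is_i_picture):
    """Choose the output picture for one time instant (role of selector 486).

    For an I picture the output is the decoded high-resolution difference
    plus the up-sampled standard-resolution I picture; otherwise it is the
    high-resolution decoder output unchanged.
    """
    if is_i_picture:
        return add_pictures(hi_decoded, upsampled_i)
    return hi_decoded


def decode_step(hi_decoded, upsampled_i, is_i_picture, frame_store):
    """Produce one output picture and store it as the reference picture."""
    output = select_output(hi_decoded, upsampled_i, is_i_picture)
    frame_store.append(output)  # the selected output, not the raw difference
    return output
```

Storing the selected output rather than the decoded difference is what keeps the decoder's reference pictures aligned with those the encoder placed in its own frame stores.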

While the non-scalable high-resolution decoder and standard-resolution intra decoder are shown as separate boxes in the block diagram, a single multifunction decoder could be used to perform both functions. Because intra decoding is generally much less complex than inter decoding, if a general purpose processor is used, it may be utilized to perform both the standard-resolution intra picture decode and high-resolution intra picture decode during the same time period as would be required to perform a high-resolution inter picture decode.

In the H.264 video coding standard, individual slices in the same picture may be coded using different prediction types. For example, a picture may contain both an I slice and a P slice. If H.264 is used for both the high-resolution and standard-resolution encoding in this invention, scalability may be performed on I slices rather than I pictures, with the requirement that the macroblocks corresponding to the I slices of the up-sampled standard-resolution picture are also coded as I slices. The I-picture detector/selector would become an I-slice detector/selector, in this embodiment.

If MPEG-2, or another coding standard which requires that all slices in the same picture be coded using the same prediction type, is used in the standard-resolution layer, and H.264 is used in the high-resolution layer, the selection of whether or not scalability is applied is dependent on the picture coding type used in the standard-resolution layer. I-slices may be coded in the high-resolution H.264 layer even if the corresponding MPEG-2 standard-resolution layer is not an I-picture, but scalability is not applied.

Various methods can be used for the upsampler and downsampler functions, including bi-linear interpolation, or multi-tap interpolation and decimation filters, as are well known to those skilled in the art.
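As one simple illustration of such functions, the sketch below halves a picture by averaging 2×2 blocks and doubles it by separable linear interpolation. The factor of two, the integer rounding, the edge handling (the last sample is repeated) and the function names are all assumptions made for the example; practical standard-resolution and high-resolution formats generally require other ratios and longer filters.

```python
def downsample_2x(picture):
    """Halve each dimension by averaging 2x2 blocks (a simple decimation filter)."""
    return [
        [
            (picture[y][x] + picture[y][x + 1]
             + picture[y + 1][x] + picture[y + 1][x + 1]) // 4
            for x in range(0, len(picture[0]) - 1, 2)
        ]
        for y in range(0, len(picture) - 1, 2)
    ]


def upsample_row_2x(row):
    """Double a row's length by linear interpolation between neighbours."""
    out = []
    for i, value in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else value  # repeat the last sample
        out.append(value)
        out.append((value + nxt) // 2)
    return out


def upsample_2x(picture):
    """Bi-linear 2x up-sampling: interpolate along rows, then along columns."""
    wide = [upsample_row_2x(row) for row in picture]
    columns = [upsample_row_2x(list(col)) for col in zip(*wide)]
    return [list(row) for row in zip(*columns)]
```

Whatever filters are chosen, the encoder and decoder would need to use the same up-sampling function, since the decoder adds the up-sampled picture back to a difference that the encoder formed against it.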

The high-resolution video sequence pictures may contain data not represented by the standard-resolution video sequence pictures, for example if the high-resolution pictures have a 16:9 aspect ratio and the standard-resolution pictures have a 4:3 aspect ratio. In that case, the up-sampling function can output a value of zero for those pixels that do not correspond to pixels present in the standard-resolution picture.
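The zero-fill behaviour may be sketched as follows. The sketch assumes, purely for illustration, that the 4:3 picture maps to the horizontal centre of the wider frame; the text does not specify the alignment, and the function name is hypothetical.

```python
def place_in_wide_frame(upsampled, out_width):
    """Centre an up-sampled picture in a wider frame, with zeros elsewhere.

    Assumption: the narrower picture is horizontally centred.
    """
    in_width = len(upsampled[0])
    if out_width < in_width:
        raise ValueError("output frame must be at least as wide as the input")
    left = (out_width - in_width) // 2
    right = out_width - in_width - left
    return [[0] * left + list(row) + [0] * right for row in upsampled]
```

With zeros in the uncovered columns, the summation at the decoder leaves the high-resolution decoded samples in those columns unchanged.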

These and other features and advantages of the present invention may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the principles of the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.

Most preferably, the principles of the present invention are implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit.

It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present invention.

Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.

1. A spatial scalable video decoder for receiving each of a standard-resolution bitstream and a high-resolution scalable bitstream and providing a high-resolution video sequence, the decoder comprising: an I-picture detector for receiving the standard-resolution bitstream; a standard-resolution Intra decoder in signal communication with the I-picture detector for decoding I-pictures; a high-resolution video decoder for receiving the high-resolution scalable bitstream; and a selector in signal communication with the standard-resolution Intra video decoder and the high-resolution video decoder for selecting between the outputs from the standard-resolution Intra video decoder and the high-resolution video decoder to provide the high-resolution video sequence.
2. A decoder as defined in claim 1, further comprising an I-picture indicator in signal communication between the standard-resolution Intra decoder and the selector.
3. A decoder as defined in claim 1, further comprising an I-picture selector in signal communication with the I-picture detector.
4. A decoder as defined in claim 1, further comprising an upsampler in signal communication with the standard-resolution Intra decoder.
5. A decoder as defined in claim 1, further comprising a summing unit in signal communication with the high-resolution decoder.
6. A decoder as defined in claim 1, further comprising high-resolution frame stores in signal communication with the high-resolution decoder.
7. A decoder as defined in claim 6 wherein the high-resolution frame stores are in signal communication with the selector for receiving the high-resolution video sequence.
8. A decoding method for providing spatial scalable decoded video data, the method comprising: receiving a standard-resolution bitstream; receiving a high-resolution scalable bitstream; Intra decoding I-pictures from the standard-resolution bitstream; up-sampling the decoded I-picture to high-resolution; high-resolution decoding a current picture from the high-resolution scalable bitstream; and summing the decoded current picture with the up-sampled I-picture.
9. A decoding method as defined in claim 8, further comprising: selecting one of the decoded current picture and the summed picture in response to an indication of the presence of an I-picture; and outputting the selected picture in a high-resolution video sequence.